□ Use of (staggered) "long tracks"

■ Increases operating frequency 

■ Decreases transistor count and area 

■ Optimizes metalization layers 

■ Decreases power dissipation 




□ Three additional strategies:

■ Operate at different frequencies, e.g.:

• MIPS MP Core at 400 MHz 

• PACT XPP Core at 200 MHz 

■ Registers in each bus-connect 

■ PAEs act sequentially, one output every 2nd or 4th clock cycle
□ Tradeoff between area and sequentiality

□ Basic operation of compiler technology can be used to achieve an
abstraction layer

■ Generate "compressed" configurations, which are expanded on
larger arrays while loading
■ No minimum array but maximum array defined by compiler setting 
• Tradeoff: Number of (re-)configurations vs. usable ALU-PAEs 
□ Basic Operation Method

■ LOAD/STORE processor 

■ RAM-PAEs act as Vector-Registers (2D/3D) 

■ Irregular data access patterns are linearized by 
LOAD/STORE while accessing RAM-PAEs 

• Can be done by the uP also!

■ LOAD ... Conf, ... Conf, Conf, ... STORE

■ Each Configuration is regarded as an OpCode 

■ No Configuration/Array internal status 
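The LOAD ... Conf ... STORE scheme above can be sketched as follows. This is only an illustrative model, not the XPP programming interface: external memory is assumed to be a flat list, RAM-PAEs are modelled as named vectors, and the helper names (`load`, `store`, `conf_scale`, `conf_offset`, `run`) are hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: RAM-PAEs are modelled as named linear vectors ("vector registers").
RamPaes = Dict[str, List[int]]


def load(memory: List[int], addresses: List[int]) -> List[int]:
    """LOAD: gather an irregular address pattern into a linear vector."""
    return [memory[a] for a in addresses]


def store(memory: List[int], addresses: List[int], vector: List[int]) -> None:
    """STORE: scatter a linear vector back to an irregular address pattern."""
    for address, value in zip(addresses, vector):
        memory[address] = value


def conf_scale(rams: RamPaes) -> None:
    """A configuration used like an opcode: r1 = 2 * r0, element-wise."""
    rams["r1"] = [2 * x for x in rams["r0"]]


def conf_offset(rams: RamPaes) -> None:
    """A second configuration: r2 = r1 + 1, element-wise."""
    rams["r2"] = [x + 1 for x in rams["r1"]]


def run(memory: List[int], src: List[int], dst: List[int],
        program: List[Callable[[RamPaes], None]]) -> None:
    rams: RamPaes = {}
    rams["r0"] = load(memory, src)      # LOAD configuration
    for conf in program:                # Conf, Conf, ...
        conf(rams)                      # all state lives in the RAM-PAEs
    store(memory, dst, rams["r2"])      # STORE configuration


memory = list(range(16))
run(memory, src=[0, 5, 10, 15], dst=[1, 2, 3, 4],
    program=[conf_scale, conf_offset])
```

Because every intermediate value sits in a RAM-PAE vector, nothing has to be kept inside the array between two configurations, which is the point of the "no internal status" rule.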

□ Code Analysis 

■ Data Dependency Analysis 

■ Data Flow Analysis 

■ Interprocedural Alias Analysis 

• Pointer analysis: statically allocated data, dynamically allocated data 

■ Interprocedural Value Range Analysis 
□ Code Optimizations
• Loop Transformations 

• Loop Normalization 

• Loop Reversal 

• Loop-Invariant Code Motion 
• Loop Unswitching

• Loop Interchange

• Loop Tiling 

• Loop Skewing 

• Loop Coalescing/Collapsing 

• Loop Fusion 

• Loop Distribution 

• Loop Unrolling 

• Loop Peeling 

• Loop Splitting 

• Loop Pushing/Embedding 
• Strength Reduction

• Induction Variable Elimination
• Strip Mining

• Scalar Expansion 

• Array Contracting/Shrinking 

• Scalar Replacement 

• Reduction Recognition 

• Idiom Recognition 

• Procedure Inlining 

• Software Pipelining 

• Vector Statement Generation 

• Node Splitting 

• If Conversion 

• Statement Reordering 
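
As an illustration of one entry in the list above, strip mining splits a single loop into an outer loop over fixed-size strips and an inner loop over one strip. The sketch below assumes the strip length would be chosen to match the capacity of a vector register (here a RAM-PAE); the value 4 and the function names are hypothetical.

```python
def saxpy_reference(a, x, y):
    """Original single loop: out[i] = a * x[i] + y[i]."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_strip_mined(a, x, y, strip=4):
    """Same computation after strip mining with an assumed strip length."""
    out = []
    for start in range(0, len(x), strip):      # outer loop: one pass per strip
        end = min(start + strip, len(x))       # the last strip may be shorter
        for i in range(start, end):            # inner loop: fits one vector
            out.append(a * x[i] + y[i])
    return out


x = list(range(10))
y = [1] * 10
result = saxpy_strip_mined(3, x, y)
```

The inner loop now has a bounded trip count, so it can be mapped onto a fixed-size resource while the outer loop sequences the strips.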




□ RAM-PAE RAM is "embedded" into the cache
□ RAM-PAEs can operate like cache lines

■ Homogeneously embedded in the cache

■ Handling access rights between uP and XPP 

■ Handling context switching / hyperthreading 

■ Abstracting non-linear address patterns
□ XPP operates like a RISC processor

□ RAM-PAEs act like registers 

□ Each configuration is atomic (unbreakable)

□ Configuration running time is limited

□ LOAD Configuration 

■ Loads external data into internal RAM-PAEs 

□ Data operations (one or multiple configurations)

• Unbreakable - no internal status to be saved! 

□ STORE Configuration 

■ Stores internal data into external RAM-PAEs 

□ Interrupts (Task/Thread-Switches) only between
(re)configurations, not at runtime
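
A minimal sketch of the rule above: an interrupt that arrives while a configuration is running is only queued, and it is serviced once that configuration has completed. The function `run_task`, the queue, and the log messages are hypothetical and model only the ordering, not real hardware behaviour.

```python
from collections import deque

pending = deque()   # interrupts that arrived but are not yet serviced
log = []            # trace of what happened, in order


def run_task(configs, pending_interrupts, trace):
    """Run configurations atomically; service interrupts only in between."""
    for name, conf in configs:
        conf()                                  # never preempted
        trace.append(f"done {name}")
        while pending_interrupts:               # boundary between configurations
            trace.append(f"irq {pending_interrupts.popleft()}")


def compute():
    pending.append("timer")                     # interrupt arrives mid-run
    log.append("compute runs to completion")


run_task(
    [("LOAD", lambda: None), ("COMPUTE", compute), ("STORE", lambda: None)],
    pending,
    log,
)
```

Since all live data sits in RAM-PAEs at a configuration boundary, a task switch at that point has no array-internal state to save.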
□ XPP Technology allows sequential
processing

■ Within ALU-PAEs, using the configuration register file as a random
access code memory

■ Coupling an ALU-PAE with a RAM-PAE: the ALU-PAE acts like a uC, the
RAM-PAE serves as the corresponding data and code memory

• As an enhancement, IO-PAEs can be used to access peripherals and
external memory
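
The ALU-PAE / RAM-PAE coupling can be pictured as a tiny accumulator machine. The sketch below is purely illustrative: the four-instruction set (LD, ADD, ST, HALT) and the memory layout are assumptions made for this example and are not taken from the XPP architecture.

```python
def run_sequencer(ram, max_steps=1000):
    """ALU-PAE as a small sequencer; `ram` plays the RAM-PAE (code and data)."""
    pc, acc = 0, 0                      # state kept in the ALU-PAE
    for _ in range(max_steps):
        op, arg = ram[pc]               # fetch the instruction from the RAM-PAE
        pc += 1
        if op == "LD":
            acc = ram[arg]              # load a data word
        elif op == "ADD":
            acc += ram[arg]             # accumulate a data word
        elif op == "ST":
            ram[arg] = acc              # write the result back
        elif op == "HALT":
            return ram
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached without HALT")


# Cells 0..3 hold code, cell 4 is unused, cells 5..7 hold data.
ram = [("LD", 5), ("ADD", 6), ("ST", 7), ("HALT", 0), None, 3, 4, 0]
run_sequencer(ram)
```

Here one memory holds both the program and its operands, mirroring the slide's statement that the RAM-PAE serves as data and code memory.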
□ Optimum trade-off between sequencing and dataflow
processing

Configurable ALU-PAE / RAM-PAE Sequencers

□ Handled by sequential processing within PAEs 

■ E.g. floating-point, division etc. can be emulated by
sequential multicycle PAE operations

■ Higher precision is calculated as a multicycle operation 

• Results are transferred in two bus cycles
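
To make the multicycle idea concrete, the sketch below adds two double-width values on a narrower datapath and hands the result over as two words, low word first. The 16-bit word width is an assumption (the slides do not state it), and `add_double_word` is a hypothetical helper.

```python
WORD_BITS = 16                      # assumed datapath width, not from the source
MASK = (1 << WORD_BITS) - 1


def add_double_word(a, b):
    """Add two 2*WORD_BITS-bit values in two steps, yielding one word per step."""
    lo = (a & MASK) + (b & MASK)                 # cycle 1: add the low words
    carry = lo >> WORD_BITS                      # carry into the high word
    yield lo & MASK                              # bus cycle 1: low result word
    hi = ((a >> WORD_BITS) & MASK) + ((b >> WORD_BITS) & MASK) + carry
    yield hi & MASK                              # bus cycle 2: high result word


words = list(add_double_word(0x0001FFFF, 0x00000001))
value = words[0] | (words[1] << WORD_BITS)       # receiver reassembles the words
```

The same pattern extends to other operations whose result is wider than one bus word: compute over several cycles, then transfer the result one word per bus cycle.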